Additions to wiki - Model Interpretation Documentation#75

Open
TimCookCountyDS wants to merge 33 commits into master from tim_wiki_model_interp

Conversation

@TimCookCountyDS

No description provided.

Save link in readme
@wrridgeway

Member

Could we resolve merge conflicts and get at least a tiny description for this PR before review?

@Damonamajor

Contributor

My main thought is if we want to include key aspects of this in the checklist?

@TimCookCountyDS

Author

> My main thought is if we want to include key aspects of this in the checklist?

@Damonamajor - definitely aligned with that sentiment. My thought is that we could just link this document in the checklist, with references to specific sections. I can go ahead and do that, and then call this finished.

@TimCookCountyDS TimCookCountyDS linked an issue May 27, 2026 that may be closed by this pull request
4 tasks
@ccao-jardine ccao-jardine self-requested a review May 27, 2026 15:27
Comment thread SOPs/Model-Evaluation-ML-metrics.md Outdated

### A. Balance Tests

*(See the "Statistical Tests" section of the model performance report.)* In a perfectly matched sample, no feature would predict whether a property is included in or excluded from the sales sample. Any feature that predicts inclusion in the sales sample at a level greater than chance (that is, with statistical significance) is likely over- or under-represented in the sample and may bias your results. (This is especially true for features that also turn out to have high SHAP values in your results.) To check this, we run a simple logistic regression predicting the likelihood of a sale given a property's features. The resulting p-value for each feature in the report tells you whether that feature predicts inclusion in the sample at a level greater than expected by chance, while the Beta value gives you a relative sense of the weight (importance) and direction (inclusion vs. exclusion) of that feature. Low p-values suggest statistical significance, and large Beta magnitudes suggest a large impact; in our report, asterisks mark statistically significant predictors. When a feature is predictive of inclusion in the sample, your sample is likely biased towards (or away from) properties with this feature, and the model may therefore value these or other properties inaccurately.
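
A minimal sketch of the balance test described above, assuming synthetic data and hypothetical feature names (`sqft_z`, `age_z`); it is not the code behind the model performance report, which is not shown in this thread. It fits a logistic regression of "sold" on parcel features with a hand-rolled Newton-Raphson loop (standard library only) and reports a Beta and a two-sided Wald p-value per feature.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_logit(X, y, iters=25):
    """Logistic regression via Newton-Raphson.

    X rows must start with 1.0 for the intercept.
    Returns (betas, p_values), with two-sided Wald p-values.
    """
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            eta = sum(b * x for b, x in zip(beta, xi))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += w * xi[a] * xi[c]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    # Standard errors: square roots of the diagonal of the inverse Hessian,
    # taken from the last iteration (effectively the converged estimate).
    se = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        se.append(math.sqrt(solve(hess, unit)[j]))
    # Two-sided p-value for z = beta / se under a standard normal.
    p_values = [math.erfc(abs(b / s) / math.sqrt(2)) for b, s in zip(beta, se)]
    return beta, p_values


# Synthetic parcels (assumption): larger homes are more likely to sell,
# while age has no effect on whether a parcel appears in the sales sample.
random.seed(0)
X, y = [], []
for _ in range(2000):
    sqft_z = random.gauss(0, 1)  # hypothetical standardized square footage
    age_z = random.gauss(0, 1)   # hypothetical standardized building age
    prob_sale = 1.0 / (1.0 + math.exp(-(-1.0 + 0.8 * sqft_z)))
    X.append([1.0, sqft_z, age_z])
    y.append(1 if random.random() < prob_sale else 0)

betas, p_values = fit_logit(X, y)
for name, b, p in zip(["intercept", "sqft_z", "age_z"], betas, p_values):
    flag = "*" if p < 0.05 else ""
    print(f"{name:>10}  beta={b:+.3f}  p={p:.4f} {flag}")
```

In this sketch a significant, positive Beta on `sqft_z` would indicate that larger homes are over-represented among sales, which is the kind of imbalance the section asks the reader to look for.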

Contributor

I also like the note of where we can find this. It may be a bit too nitty to do this for every section, but maybe under the big headers, note which sections of reports we can find the different interpretations.

@ccao-jardine ccao-jardine left a comment

Member

This is solid, thanks to everyone who has written and pitched in to review! I'm here to add my own comments. Nitpicks are optional; anything not marked as a nitpick I'd like to discuss or resolve.

10 comment threads on SOPs/Model-Evaluation-ML-metrics.md (9 marked Outdated)
TimCookCountyDS and others added 12 commits June 8, 2026 14:43
Co-authored-by: Nicole Jardine <138712135+ccao-jardine@users.noreply.github.com>
Co-authored-by: Nicole Jardine <138712135+ccao-jardine@users.noreply.github.com>
Co-authored-by: Nicole Jardine <138712135+ccao-jardine@users.noreply.github.com>
Co-authored-by: Nicole Jardine <138712135+ccao-jardine@users.noreply.github.com>
Co-authored-by: Nicole Jardine <138712135+ccao-jardine@users.noreply.github.com>
Co-authored-by: Nicole Jardine <138712135+ccao-jardine@users.noreply.github.com>
corrected inaccurate links for lgbm missingness handling
Co-authored-by: Nicole Jardine <138712135+ccao-jardine@users.noreply.github.com>
fixed progress and poverty link
Co-authored-by: Nicole Jardine <138712135+ccao-jardine@users.noreply.github.com>
Co-authored-by: Nicole Jardine <138712135+ccao-jardine@users.noreply.github.com>
@TimCookCountyDS

Author

@ccao-jardine - feedback incorporated. Let me know if anything else is needed before I merge.

@wrridgeway wrridgeway left a comment

Member

Looking good! I'll do another review once we've added something like a terms section to make this a bit easier to read through.

Comment on lines +5 to +16
**Overview:**

1. Assessing how representative your sales sample is of the assessment set.
- a. Balance tests
- b. Visual inspection
- c. Not missing at random
- d. Domain specific approach

2. Noting any real-world housing market changes that may impact your model, and/or interactions between data and model that may affect your results (model drift, data drift).

3. Interpreting model performance (evaluating machine learning and assessment metrics).

@wrridgeway wrridgeway Jun 10, 2026

Member

Suggested change
**Overview:**
1. Assessing how representative your sales sample is of the assessment set.
- a. Balance tests
- b. Visual inspection
- c. Not missing at random
- d. Domain specific approach
2. Noting any real-world housing market changes that may impact your model, and/or interactions between data and model that may affect your results (model drift, data drift).
3. Interpreting model performance (evaluating machine learning and assessment metrics).

There is an "Outline" button next to markdown files that already provides this feature in a really clean way:

Image

Or, if we're committed to this outline, I'd link to the sections through it using section links.

Comment thread SOPs/Model-Evaluation-ML-metrics.md Outdated

@wrridgeway wrridgeway Jun 10, 2026

Member

I would replace this outline with a "terms" section and then try to clean up the constant switching between population, sample, sales, and assessment. It's a lot of parentheses and extra language that we could get out of the way super quick and then use a couple small, consistent terms throughout. I tried to clean this up in the word doc but perhaps I made it worse.

Useful terms

  • Sample: the universe of parcel sales we use to train and test our model
  • Population: the universe of parcels that the model needs to value

etc...

@TimCookCountyDS TimCookCountyDS Jun 25, 2026

Author

Great call. I think it's really important to be clear on these terms (population, sample, sales, and assessment), as they can be a source of confusion when discussing different model outputs and types of evaluation (especially with regard to differences between ML evaluation and domain-specific evaluation).

@ccao-jardine ccao-jardine mentioned this pull request Jun 17, 2026
4 tasks
Co-authored-by: William Ridgeway <10358980+wrridgeway@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Additions to Model template for model scoring and interpretation

4 participants